SNPdetector: A Software Tool for Sensitive and Accurate SNP Detection

Author: David A Wheeler
Gabor Marth
Imtiaz Yakub
Jinghui Zhang
Kenneth H Buetow
Paul P Liu
Raman Sood
Richard A Gibbs
Sharon Wei
The Encode Consortium
The International HapMap Consortium
William Rowe
Publication venue: Public Library of Science
Publication date: 01/10/2005

Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore, it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false-positive and false-negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).
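As a rough sanity check on the 0.04% allele frequency quoted for the third study, the arithmetic can be sketched as follows. This assumes that zebrafish are diploid and that the figure corresponds to a single mutant allele copy among all 1,236 fish screened; the abstract does not state this interpretation, and the function name `allele_frequency` is hypothetical.

```python
def allele_frequency(mutant_alleles: int, individuals: int, ploidy: int = 2) -> float:
    """Fraction of all sampled allele copies that carry the mutation.

    Assumption: every individual contributes `ploidy` allele copies
    (2 for a diploid organism such as zebrafish).
    """
    if individuals <= 0 or ploidy <= 0:
        raise ValueError("individuals and ploidy must be positive")
    return mutant_alleles / (individuals * ploidy)


# One heterozygous carrier among 1,236 diploid fish: 1 / 2,472 allele copies.
freq = allele_frequency(mutant_alleles=1, individuals=1236)
print(f"{freq:.4%}")  # prints 0.0405%
```

Under these assumptions, one mutant copy in 2,472 comes to roughly 0.04%, which is consistent with the quoted figure.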

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Rejected and Accepted Bases in a Sequence Trace

Author: David A Wheeler
Imtiaz Yakub
Jinghui Zhang
Kenneth H Buetow
Paul P Liu
Raman Sood
Richard A Gibbs
Sharon Wei
William Rowe

The Phred quality scores are indicated at the top. The quality scores for rejected bases are labeled in red. Accepted bases are marked by rectangular boxes.

(A) A subregion of a polyA bubble showing that low-quality bases with no secondary peaks are accepted by SNPdetector.

(B) A subregion showing that a Q20 base is rejected because of its high secondary peak even though the majority of neighboring bases have high quality scores.
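The behavior described in panels (A) and (B) can be illustrated with a minimal sketch. Only the Phred relation Q = -10 log10(P) is standard. The `BaseCall` fields, the `accept_base` rule, and the 0.25 secondary-to-primary peak ratio threshold are hypothetical and are not taken from SNPdetector itself, whose actual criteria are not given in this caption.

```python
from dataclasses import dataclass


def phred_to_error_prob(q: float) -> float:
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


@dataclass
class BaseCall:
    base: str              # called nucleotide
    phred: int             # Phred quality score
    primary_peak: float    # height of the called peak (arbitrary units)
    secondary_peak: float  # height of the largest competing peak


def accept_base(call: BaseCall, max_secondary_ratio: float = 0.25) -> bool:
    """Hypothetical rule: judge a base by its secondary peak, not by Phred alone.

    A base with no meaningful secondary peak is accepted even if its Phred
    score is low (panel A); a base with a high secondary peak is rejected
    even at Q20 (panel B). The threshold is an illustrative assumption.
    """
    if call.primary_peak <= 0:
        return False
    return call.secondary_peak / call.primary_peak <= max_secondary_ratio


low_quality_clean = BaseCall("A", phred=12, primary_peak=800.0, secondary_peak=0.0)
q20_noisy = BaseCall("G", phred=20, primary_peak=900.0, secondary_peak=500.0)

print(accept_base(low_quality_clean))  # True: no secondary peak
print(accept_base(q20_noisy))          # False: high secondary peak
```

For reference, a Q20 score corresponds to an estimated error probability of 1 in 100 under the Phred relation, which is why a rule based on Phred alone would normally keep the base shown in panel (B).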

FigShare

Sequence Traces of a SNP Cluster with Three Consecutive SNPs

Author: David A Wheeler
Imtiaz Yakub
Jinghui Zhang
Kenneth H Buetow
Paul P Liu
Raman Sood
Richard A Gibbs
Sharon Wei
William Rowe

The top is a homozygous sample and the bottom a heterozygous one. The Phred quality score is labeled on top of each base. In the heterozygous sample, the three HQDPs around the three heterozygotes are labeled with red lines at the bottom. The flanking bases used for calculating genotype quality class of the highlighted heterozygote in the middle are marked by rectangular boxes, which do not include any HQDPs. The flanking bases used to assess background noise in the flanking region are labeled with brackets at the bottom.

FigShare

Schematic Diagram of the Principal Steps in the Analysis of Sequencing Variants Found by SNPdetector

Author: David A Wheeler
Imtiaz Yakub
Jinghui Zhang
Kenneth H Buetow
Paul P Liu
Raman Sood
Richard A Gibbs
Sharon Wei
William Rowe

Parallelograms are analytical modules (usually C programs), and rectangles are input and output data. Programs obtained from the public domain are displayed in italics, while those developed in this work are shown in bold. SNPdetector requires the following three sets of input data: (1) a template sequence file, (2) the forward and reverse sequencing primers, and (3) the trace files. The output includes a list of high-quality SNPs and their genotype calls in each subject.

FigShare